5,132 research outputs found

    Nonparametric Bayesian Modeling for Automated Database Schema Matching

    Full text link
    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models

    Nonparametric Bayesian methods for one-dimensional diffusion models

    Full text link
    In this paper we review recently developed methods for nonparametric Bayesian inference for one-dimensional diffusion models. We discuss different possible prior distributions, computational issues, and asymptotic results

    Nonparametric Bayesian Inference on Bivariate Extremes

    Full text link
    The tail of a bivariate distribution function in the domain of attraction of a bivariate extreme-value distribution may be approximated by the one of its extreme-value attractor. The extreme-value attractor has margins that belong to a three-parameter family and a dependence structure which is characterised by a spectral measure, that is a probability measure on the unit interval with mean equal to one half. As an alternative to parametric modelling of the spectral measure, we propose an infinite-dimensional model which is at the same time manageable and still dense within the class of spectral measures. Inference is done in a Bayesian framework, using the censored-likelihood approach. In particular, we construct a prior distribution on the class of spectral measures and develop a trans-dimensional Markov chain Monte Carlo algorithm for numerical computations. The method provides a bivariate predictive density which can be used for predicting the extreme outcomes of the bivariate distribution. In a practical perspective, this is useful for computing rare event probabilities and extreme conditional quantiles. The methodology is validated by simulations and applied to a data-set of Danish fire insurance claims.Comment: The paper has been withdrawn by the author due to a major revisio

    A spatiotemporal nonparametric Bayesian model of multi-subject fMRI data

    Get PDF
    In this paper we propose a unified, probabilistically coherent framework for the analysis of task-related brain activity in multi-subject fMRI experiments. This is distinct from two-stage “group analysis” approaches traditionally considered in the fMRI literature, which separate the inference on the individual fMRI time courses from the inference at the population level. In our modeling approach we consider a spatiotemporal linear regression model and specifically account for the between-subjects heterogeneity in neuronal activity via a spatially informed multi-subject nonparametric variable selection prior. For posterior inference, in addition to Markov chain Monte Carlo sampling algorithms, we develop suitable variational Bayes algorithms. We show on simulated data that variational Bayes inference achieves satisfactory results at more reduced computational costs than using MCMC, allowing scalability of our methods. In an application to data collected to assess brain responses to emotional stimuli our method correctly detects activation in visual areas when visual stimuli are presented

    Incremental Learning of Nonparametric Bayesian Mixture Models

    Get PDF
    Clustering is a fundamental task in many vision applications. To date, most clustering algorithms work in a batch setting and training examples must be gathered in a large group before learning can begin. Here we explore incremental clustering, in which data can arrive continuously. We present a novel incremental model-based clustering algorithm based on nonparametric Bayesian methods, which we call Memory Bounded Variational Dirichlet Process (MB-VDP). The number of clusters are determined flexibly by the data and the approach can be used to automatically discover object categories. The computational requirements required to produce model updates are bounded and do not grow with the amount of data processed. The technique is well suited to very large datasets, and we show that our approach outperforms existing online alternatives for learning nonparametric Bayesian mixture models
    corecore